Watcom is back!

eyebex^threekings

Abstract

Some of the older sceners among you (who miss the good old DOS times like I do :-) surely still remember the famous Watcom compiler. This article deals with its Open Source successor, Open Watcom, and explains why this compiler can still be the first choice for Windows intro development, where the size of the final executable is a big issue.

Introduction

Watcom, the original company behind the "Watcom C / C++ Compiler", was bought by Powersoft back in 1994 / 1995. Powersoft itself was later acquired by Sybase. With the release of version 11.0b it was announced that further development of the compiler would be stopped. Many developers were quite upset: Watcom was regarded as a very good compiler, but some bugs in the 11.0x releases kept them from leaving behind their beloved 10.6 release. They at least wanted one last official stable release. Sybase did not seem to have any intention of releasing another official version, but fortunately the developers' calls were heard by others: Eventually, both Sybase and (most notably) SciTech started a joint effort and announced the "Open Watcom" project. After a long time of almost zero progress (internal license issues slowed down the whole development), a binary patch 11.0c was released for users who had bought a previous Watcom version, which finally fixed a lot of bugs. And since late 2002, developers without an old Watcom license have been able to download and use Open Watcom for free, although donations are welcome. Currently, Open Watcom 1.1 is the most recent version.

Warning: Do not install Open Watcom in a path that contains spaces; it will screw up the linker and other tools!

Hint: If this warning comes too late and you've installed Open Watcom on an NTFS formatted drive, you could create a junction point which does not contain any spaces to the original folder using this tool. That's just what I did.

Register calling conventions

After this little history, now what's so special about Open Watcom (OW for short)? First of all the IDE: It's special because it is complete bull****. There are two reasons why I'm saying this: Firstly it's true, secondly I want you to be prepared for that shock :-) Seriously: In general, don't let the first appearance of an IDE deceive you about the power of the underlying compiler system. Most coders will use their preferred editor and makefiles instead of an IDE anyway.

The (positive :-) feature I like the most in OW is its simple but powerful way of specifying register calling conventions, which are the default calling convention in OW. "What is so cool about register calling conventions?", you may ask. Well, in general they create both smaller and faster code than putting your arguments on the stack! That's because building the stack frame can be omitted and memory is accessed less often (if at all). I'm currently writing my first 4k intro, which is why in all my upcoming examples I'll try to optimize for size and not for speed. Suppose you have a function like:

void memclr(size_t bytes,void *memory) {
  __asm {
    mov al,0
    cld
    mov ecx,[bytes]
    mov edi,[memory]
    rep stosb
  }
}

The Microsoft Visual Studio .NET 2002 C++ compiler (MSVC for short), which I'll take as a reference here, compiles this (using the "Minimize Size" switch /O1) to:

LEA EAX,DWORD PTR SS:[EBP-30]
PUSH EAX
PUSH 28
CALL memclr

...

memclr:
PUSH EDI
MOV AL,0
CLD
MOV ECX,DWORD PTR SS:[ESP+8]
MOV EDI,DWORD PTR SS:[ESP+C]
REP STOS BYTE PTR ES:[EDI]
POP EDI
RETN

...

POP ECX
POP ECX

Of course, even with MSVC this can be further optimized. The "__declspec(naked)" modifier forces MSVC not to generate any prolog / epilog code (no code to save modified registers, not even a return instruction), whereas "__fastcall" passes the first two DWORD or smaller sized arguments in the ECX and EDX registers. Further arguments are passed via the stack. Now the function looks like this:

__declspec(naked) void __fastcall memclr(size_t bytes,void *memory) {
  __asm {
    pushad
    mov al,0
    cld
    mov edi,edx
    rep stosb
    popad
    ret
  }
}

Luckily, the "bytes" argument will be passed in the ECX register just where we need it. The "memory" pointer needs to be moved from EDX to EDI, though. See what code MSVC generates:

PUSH 28
LEA EDX,DWORD PTR SS:[EBP-30]
POP ECX
CALL memclr

...

memclr:
PUSHAD
MOV AL,0
CLD
MOV EDI,EDX
REP STOS BYTE PTR ES:[EDI]
POPAD
RETN

Note that no code is necessary to clean up the stack! The generated code size is 21 bytes vs. 29 bytes from the previous example. Not bad, but we can go further. This is where OW appears on the scene: It lets us specify explicitly which registers the arguments are passed in, and it lets us specify which registers are modified, so the compiler can decide whether it's actually necessary to save any register contents. The code for OW looks like this:

void memclr(size_t bytes,void *memory);
#pragma aux memclr parm [ecx] [edi];

...

__declspec(naked) void memclr(size_t bytes,void *memory) {
  __asm {
    mov al,0
    cld
    rep stosb
    ret
  }
}

An auxiliary pragma directive in the header file is used to notify OW of our needs. As you probably have guessed, "parm" is followed by the list of registers which should take the values corresponding to the arguments as specified in the function declaration. Normally, I'd also have to specify a "modify" attribute in order to tell OW which registers are altered, but registers used to pass arguments as well as the EAX register are assumed to be modified anyway (this assumption can be overridden with an additional "exact" attribute).

Note: The "__fastcall" keyword is "supported" in OW, but has no meaning. That is, it won't make OW pass arguments in the ECX or EDX registers, it simply does nothing. This is a big pitfall when porting code that contains inline assembly from MSVC to OW. A droll remark from the OW manual: "The _fastcall and __fastcall keywords are scanned but ignored since they refer to a particular Microsoft code generation technique. Open Watcom's generated code is always 'fast'." Now that's nice, isn't it? :-)
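
One way to defuse that pitfall could look like the following sketch. It is only an illustration under assumptions: the REGCALL macro is a name I made up, and the "parm [ecx] [edx]" list simply asks OW for the same two registers that MSVC's "__fastcall" would use, written in the pragma form shown above. The body is plain C here so the sketch builds with any compiler; the register choice only matters once the body is inline assembly that expects its arguments in ECX and EDX.

```c
#include <stddef.h>

/* REGCALL is a hypothetical helper macro, not part of any compiler. */
#if defined(_MSC_VER) && defined(_M_IX86)
#define REGCALL __fastcall   /* MSVC, 32-bit x86: first two args in ECX, EDX */
#else
#define REGCALL              /* OW ignores __fastcall, so leave it out */
#endif

void REGCALL memclr(size_t bytes, void *memory);

#if defined(__WATCOMC__)
/* Assumption: request the same registers MSVC's __fastcall would use. */
#pragma aux memclr parm [ecx] [edx];
#endif

/* Plain C body so the sketch builds everywhere. */
void REGCALL memclr(size_t bytes, void *memory) {
    unsigned char *p = (unsigned char *)memory;
    while (bytes--)
        *p++ = 0;
}
```

With something like this in a shared header, a call such as memclr(sizeof buffer, buffer) stays the same for both compilers.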

OW compiles this code (using the "Space optimizations" switch -os) to:

LEA EDI,DWORD PTR SS:[EBP-2C]
MOV ECX,28
CALL memclr

...

memclr:
MOV AL,0
CLD
REP STOS BYTE PTR ES:[EDI]
RETN

The generated code size is 19 bytes now, and it would be 17 bytes if OW had replaced "MOV ECX,28" with a PUSH / POP sequence. Of course the size depends on the current register usage (because this influences which register contents the compiler needs to preserve) and is hardly ever the same, but this is just an example. Believe me, in general you're better off using register calling conventions.
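
If you want to check the byte counts quoted above, a few lines of C are enough. The sketch below just sums the encoded length of every instruction in the three listings. The per-instruction sizes are my own reading of the standard 32-bit x86 encodings (8-bit displacements, a 5-byte near CALL), not something the compilers report, and the array and function names are made up for this illustration.

```c
#include <stddef.h>

/* Encoded length in bytes of each instruction, in listing order. */
static const unsigned char msvc_stack[] = {
    3, 1, 2, 5,             /* LEA EAX / PUSH EAX / PUSH 28 / CALL        */
    1, 2, 1, 4, 4, 2, 1, 1, /* PUSH EDI / MOV AL,0 / CLD / MOV ECX,[..] /
                               MOV EDI,[..] / REP STOS / POP EDI / RETN   */
    1, 1                    /* POP ECX / POP ECX                          */
};

static const unsigned char msvc_fastcall[] = {
    2, 3, 1, 5,             /* PUSH 28 / LEA EDX / POP ECX / CALL         */
    1, 2, 1, 2, 2, 1, 1     /* PUSHAD / MOV AL,0 / CLD / MOV EDI,EDX /
                               REP STOS / POPAD / RETN                    */
};

static const unsigned char ow_register[] = {
    3, 5, 5,                /* LEA EDI / MOV ECX,28 / CALL                */
    2, 1, 2, 1              /* MOV AL,0 / CLD / REP STOS / RETN           */
};

/* Sum the instruction lengths of one listing. */
static size_t listing_bytes(const unsigned char *sizes, size_t count) {
    size_t total = 0;
    size_t i;
    for (i = 0; i < count; ++i)
        total += sizes[i];
    return total;
}
```

Called as listing_bytes(msvc_stack, sizeof msvc_stack) and so on, this gives 29, 21 and 19. Swapping the 5-byte "MOV ECX,28" for a 2-byte PUSH plus a 1-byte POP accounts for the 17.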

Intrinsics

We can still go one step further: For small functions like the one above it might be wise to implement them as intrinsics (or "inline functions"). In the previous example, the call instruction takes 5 bytes, which is just as much as the code size of the whole function body (not counting the return instruction). So implementing it as an intrinsic at least does not hurt, and maybe an executable packer used later will even be grateful for the repeating bytes. In MSVC, the keyword "__forceinline" would achieve this, but guess what, it does not work correctly in conjunction with the "__fastcall" keyword, nor can it be used together with the "__declspec(naked)" modifier! Be surprised how easy intrinsics are with OW. This code goes into the header file:

void memclr(size_t bytes,void *memory);
#pragma aux memclr = \
  "mov al,0" \
  "cld" \
  "rep stosb" \
  parm [ecx] [edi];
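
The same pattern extends to other tiny helpers. As a sketch only: "memfill" below is a made-up name, and passing the fill value in EAX (so that "stosb" stores AL) is my assumption, written in the pragma form shown above. The portable branch is just a stand-in so the snippet also builds with compilers that don't know auxiliary pragmas.

```c
#include <stddef.h>

#if defined(__WATCOMC__)
/* Assumed OW intrinsic: value in EAX (AL gets stored), count in ECX,
   destination in EDI. */
void memfill(int value, size_t bytes, void *memory);
#pragma aux memfill = \
  "cld" \
  "rep stosb" \
  parm [eax] [ecx] [edi];
#else
/* Portable stand-in for other compilers. */
static void memfill(int value, size_t bytes, void *memory) {
    unsigned char *p = (unsigned char *)memory;
    while (bytes--)
        *p++ = (unsigned char)value;
}
#endif
```

A call like memfill(0, sizeof buffer, buffer) then does the job of memclr.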

I don't think it can be any easier. But there's yet another cool feature to come! Think about this: When you're linking without the C runtime to make your intro really small, you'll have to implement simple functions like "sin" yourself. "Not a big issue", you may think, "simply use the fsin FPU instruction, you moron!". Yeah, right. But how to do the C interface? As we have learned above, passing that single argument of type "double" that specifies the angle via the stack is not only drop-dead uncool but also absolute overkill for that single-instruction function. The surprise is: Unlike MSVC with its "__fastcall" keyword, which can only pass arguments in general purpose registers, OW can also pass arguments directly in the FPU registers (which actually are a stack)! So implementing a "sin" function is no more than:

double sin(double angle);
#pragma aux sin parm [8087] modify [8087] value [8087] = "fsin";

Heck, that's brilliant, don't you think so? You simply have zero overhead compared to hand-written assembly. Don't forget to compile your source with the "fpi" or "fpi87" option to make this code work correctly. The "8087" in the "parm" attribute means the argument is pushed onto the FPU stack, so that the last pushed argument resides in ST(0). The "8087" in the "modify" attribute means that all FPU registers (not just ST(0)) might be modified by the function. I haven't mentioned the "value" attribute so far, but you surely guessed already that it specifies the register which holds the function's return value. Here, "8087" tells the compiler that the result is found in ST(0). Back in the good old DOS days, the guys from Cubic Team & $een (Hi Submissive, I guess you don't remember me, do you?) released a nice header file for Watcom which defines all kinds of useful math functions.
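
Other single-instruction FPU functions should follow the same recipe. The sketch below is an assumption on my side, not taken from that header: "fpu_abs" and "fpu_neg" are made-up names wrapping the "fabs" and "fchs" instructions, with the same attributes as the "sin" example. The plain C branch only exists so the snippet builds with other compilers, too.

```c
#if defined(__WATCOMC__)
/* Assumed OW intrinsics: argument and result both live in ST(0). */
double fpu_abs(double x);
#pragma aux fpu_abs = "fabs" parm [8087] value [8087] modify [8087];

double fpu_neg(double x);
#pragma aux fpu_neg = "fchs" parm [8087] value [8087] modify [8087];
#else
/* Portable stand-ins; unlike the fabs instruction, this fpu_abs
   leaves the sign of -0.0 untouched. */
static double fpu_abs(double x) { return x < 0.0 ? -x : x; }
static double fpu_neg(double x) { return -x; }
#endif
```

As with "sin", the OW branch needs the "fpi" or "fpi87" option.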

Final words

There are other nice features of OW, almost all of which are realized by using auxiliary pragmas, but mentioning all of them would definitely go beyond the scope of this article. Simply download OW and take a look at the documentation, it's quite good as it used to be a commercial product! Just remember my words above and don't install OW in a path that contains spaces.

A few words to those guys who wonder why I compared OW to MSVC instead of e.g. GCC / MinGW: It's true that GCC's abilities regarding inline assembly (with register calling conventions) and intrinsic functions are better than MSVC's, but they are still worse than OW's (from what I know, GCC doesn't allow you to pass arguments in FPU registers, for example). Additionally, as I mentioned before, I'm currently trying to write my first 4k intro, and I did a little comparison of the executable file sizes produced by these three compilers (read "linkers", as it's mainly the linker that influences the file size), and guess what: GCC is worst, then comes OW, closely followed by MSVC, which is best. And since I simply hate some of the strange syntax GCC requires, I focused my attention on the two best compilers.

Contact information

I'd be interested to hear whether you could make any use of this article, whether you found it too long or too short, too easy or too advanced, simply boring or anything else. You can write to me in English or German, whichever you prefer, to the following address:

eyebex ^ threekings